Abstract

We have applied the optimal estimator for f_NL^local to the 5 year WMAP data. Marginalizing over the
amplitude of foreground templates we get -4 < f_NL^local < 80 at 95% CL. Error bars of previous
(sub-optimal) analyses are roughly 40% larger than these. The probability that a Gaussian simulation,
analyzed using our estimator, gives a result larger in magnitude than the one we find is 7%. Our
pipeline gives consistent results when applied to the three and five year WMAP data releases and
agrees well with the results from our own sub-optimal pipeline. We find no evidence of any residual
foreground contamination.



1 Introduction 

It has become apparent that departures from Gaussianity of the primordial perturbations 
could shed light on the physics of inflation. In most inflationary models perturbations tend to 
be very close to Gaussian with possible observable departures only for the three-point function 
(or bispectrum). Single field models of inflation, in which the quantum fluctuations of the 
same field that dominates the energy density during inflation become the seeds for structure 
formation, satisfy a consistency relation that relates the shape of the three-point function to 
the dynamics of the inflaton field (e.g. [1, 2, 3]). These models produce a bispectrum that is
either unmeasurably small or, when large, of the so-called equilateral shape, whose amplitude
is usually denoted by f_NL^equil [4, 5]. Testing the inflationary bispectrum could allow us to
distinguish single field models from other alternatives, or to measure the "sound speed" (c_s) of
perturbations during inflation [6, 7]. Models where the fluctuations of a field other than the
inflaton seed the observed large scale structure, as happens for example in some multi-field
inflationary models [8], fall in a different class. These models produce a bispectrum of the
so-called local type, whose amplitude peaks in the squeezed limit. This bispectrum encodes
correlations between modes that exited the horizon at very different times during inflation [9].
Large correlations of this type are forbidden in single field models. The amplitude of the
bispectrum of the local type is usually parametrized by f_NL^local.

The most recent search for non-gaussianity in CMB data was done by the WMAP team,
which used 5 years worth of data to put constraints on both the local and equilateral shapes.
Their best estimates are -9 < f_NL^local < 111 (95% CL) and -151 < f_NL^equil < 253 (95% CL),
both consistent with Gaussian initial conditions [10]. These results are not without
puzzles. Panel (a) of Figure 1 compares the quoted WMAP results to those obtained earlier
by a different group [11], where a detection of non-gaussianity of the local type was claimed:
27 < f_NL^local < 147 (95% CL).

A detection of local non-gaussianity would have profound consequences for our understanding
of inflation, ruling out all single field inflation models. Thus it is important to
understand what changed. The error bars in Figure 1 are dominated by the cosmic variance of
the large scale modes, so the shift seen between the 3 and 5 year results was not expected.
Furthermore, both analyses used basically the same method to constrain f_NL^local.

Indeed, if one compares [10] and [11] for the same choice of analysis parameters (l_max = 500,
using raw maps and the Kp0 mask), the shift in results is remarkable (panel (b) of Figure 1).
While [10] gets -4 < f_NL^local < 100, [11] gets 25 < f_NL^local < 135 (95% CL). Notice that
the size of the 95% confidence interval has changed little, meaning that there is not much
additional information in the 5 year data set. The error bars are dominated by the
"cosmic-variance" component, which is common to both data sets since they observe the
same sky. The shift of the mean value between the two analyses was dramatic (however, one
should be careful when comparing these results, as they were obtained by different groups
with slightly different ways of weighting the data). Did something change in the data?

A natural worry when searching for deviations from Gaussianity is the effect of foregrounds.
The WMAP team in their 5 year papers advocates using foreground cleaned maps
and a new mask, the KQ75 mask, which is larger than the Kp0 mask (the standard mask
used in analysis of the 3 year data release). The KQ75 mask cuts out 4.9% more sky than
Kp0 (f_sky^Kp0 = 0.765 and f_sky^KQ75 = 0.716). The authors of [11], on the other hand, advocated
looking at raw maps, arguing that foregrounds appear to bias estimates of f_NL^local negative (as
discussed later, we do not agree with this conclusion). If foregrounds bias estimates negative,
then using raw maps results in a lower limit for f_NL^local, which was found to be positive. It would
then seem that using raw maps only strengthens the significance of the detection in [11].



1 The same mechanism can be effective during the contracting phase of the new Ekpyrotic universe as well.
Panel (c) of Figure 1 compares the effect of changing the mask when analyzing clean maps.
Panel (d) of Figure 1 compares the effect of changing the mask when analyzing raw maps.
Both sets of results are from [10]. The first thing to note is that the choice of mask makes
a difference, shifting for example the range from -4 < f_NL^local < 100 for raw maps with Kp0
to -17 < f_NL^local < 103 for raw maps with KQ75. Notice that the 95% range increased by
15%, much more than the expected 3.5% increase that results from a 1/sqrt(f_sky) scaling of the
error bars. Using cleaned vs raw maps had a more dramatic effect on the mean value of f_NL^local.
In the case of the Kp0 mask, the range changed from -4 < f_NL^local < 100 for raw maps to
9 < f_NL^local < 113 for cleaned maps, which excludes zero at 95% CL. The excess is not significant at 95% CL in
the cleaned maps masked with KQ75 (-5 < f_NL^local < 115). Note that the increase in the error
bars as one moves from Kp0 to KQ75 is in large part responsible for the decreased statistical
significance of the excess. This increase of the error bars is directly related to the lack of
optimality of the old algorithm, so the situation is not fully satisfactory.
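As a quick arithmetic check of the two percentages above, the sketch below recomputes the widening of the raw-map 95% interval when going from Kp0 to KQ75, and the naive expectation if the error scaled only as 1/sqrt(f_sky). It uses only the numbers quoted in the text; treating the sky fraction as the only cause of the expected change is the simplifying assumption.

```python
import math

# 95% CL intervals for f_NL^local from raw maps, as quoted from [10]
kp0 = (-4.0, 100.0)
kq75 = (-17.0, 103.0)

# Observed sky fractions of the two masks
fsky_kp0, fsky_kq75 = 0.765, 0.716

# Observed ratio of interval widths
width_ratio = (kq75[1] - kq75[0]) / (kp0[1] - kp0[0])
# Naive expectation if sigma(f_NL) scales as 1/sqrt(f_sky)
expected_ratio = math.sqrt(fsky_kp0 / fsky_kq75)

print(f"observed widening: {100 * (width_ratio - 1):.1f}%")
print(f"1/sqrt(f_sky) expectation: {100 * (expected_ratio - 1):.1f}%")
```

With these rounded sky fractions the observed widening is about 15% and the expectation about 3.4%, consistent with the quoted 3.5% up to rounding.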
Figure 1: Current constraints on f_NL^local. Errors in this figure and throughout the paper are 2σ.
Panel (a): best results from WMAP 5 years from the WMAP team [10] and WMAP 3 years
from Yadav & Wandelt [11], together with the large scale structure results from Slosar et al.
[15] and the results from this paper using our optimal method (OPT). Panel (b): comparison
of [10] and [11] for the same choice of analysis parameters (l_max = 500, raw maps and the Kp0
mask). Panels (c) and (d) show the effect of the mask for cleaned and raw maps respectively
(from [10]).

In summary, there was a large shift between the 5 and 3 year results of [10] and [11].
Furthermore, even within the 5 year results masked with KQ75, foregrounds are still somewhat
of an issue, in that they change the results when comparing raw and cleaned maps. Thus, to
correctly assess the significance of a non-gaussianity detection it would be preferable not just
to use clean maps but to be more conservative and marginalize over foregrounds, including
the uncertainty in the foreground cleaning procedure in the final error bars. Finally, the
analysis method seems too sensitive to the choice of mask, and that sensitivity accounts for
part of the decrease in the significance.

Given the importance of a detection of local non-Gaussianity, it is imperative to improve
the situation. We will do so by analyzing the data using the optimal estimator found in [13],
with the implementation developed in [14], and by improving the treatment of foregrounds.
Using the standard estimator results in error bars that are 40% larger than those obtained
here. Our best estimate is -4 < f_NL^local < 80 at 95% CL.

There is another probe of non-gaussianity that can compete with the CMB in terms of
statistical power: the measurement of the scale dependence of the bias of large scale structure
tracers [12]. The first result obtained using this technique, -29 < f_NL^local < 70, is consistent
with gaussianity (panel (a) of Figure 1) and has errors similar to those obtained with the CMB
[15]. It may be early days for this new probe, but current results at least disfavor a large
f_NL^local.

If we combine the optimal WMAP5 result from this paper with the SDSS result from [15],
we get -1 < f_NL^local < 63 at 95% CL. Constraints on f_NL^local from other datasets currently have
negligible statistical weight in comparison to WMAP5+SDSS, so this last result combines
essentially all the data to date.
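The combined range can be roughly reproduced with an inverse-variance average. The sketch below is only an approximation, not necessarily the procedure used to obtain the quoted numbers: it assumes the two constraints are independent and Gaussian, and converts each 95% interval into a mean and 1σ error by taking the midpoint and a quarter of the width (which recovers 38 ± 21 for WMAP5).

```python
import math

def interval_to_gaussian(lo, hi):
    """Convert a symmetric 95% (2-sigma) interval into (mean, sigma)."""
    return 0.5 * (lo + hi), 0.25 * (hi - lo)

def combine(*constraints):
    """Inverse-variance weighted mean and 1-sigma error of independent Gaussian constraints."""
    weights = [1.0 / s**2 for _, s in constraints]
    mean = sum(w * m for w, (m, _) in zip(weights, constraints)) / sum(weights)
    return mean, 1.0 / math.sqrt(sum(weights))

wmap5 = interval_to_gaussian(-4.0, 80.0)   # this paper
sdss = interval_to_gaussian(-29.0, 70.0)   # large-scale structure result [15]
mean, sigma = combine(wmap5, sdss)
lo, hi = mean - 2 * sigma, mean + 2 * sigma
print(f"combined 95% range: {lo:.0f} < f_NL^local < {hi:.0f}")
```

Under these assumptions the sketch gives roughly -1 < f_NL^local < 63. The real SDSS likelihood need not be Gaussian, so the close agreement should not be over-interpreted.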

In section 2 we will summarize our methods, in section 3 we present our results, in sections
4 and 5 we describe tests of the robustness of our results, and we conclude in section 6. We leave
some technical details to the appendix.



2 Summary of analysis methods 
2.1 Optimal analysis 

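The optimal (i.e. minimum-variance) estimator for an arbitrary bispectrum B_{l1 l2 l3} was constructed
in [13, 16], building on previous work in [11], and shown to contain both cubic and
linear terms:

\hat{f}_{NL} = \frac{1}{M} \sum_{\ell_i m_i} B^{m_1 m_2 m_3}_{\ell_1 \ell_2 \ell_3} \left[ (C^{-1}a)_{\ell_1 m_1} (C^{-1}a)_{\ell_2 m_2} (C^{-1}a)_{\ell_3 m_3} - 3\, C^{-1}_{\ell_1 m_1, \ell_2 m_2} (C^{-1}a)_{\ell_3 m_3} \right]        (1)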

where M is a constant which normalizes the estimator to have unit response to f_NL. Here,
a_lm is assumed to be a noisy measurement of the CMB with signal + noise covariance
C = (S + N). The C^-1 filter appearing in Eq. (1) optimally weights the data in the presence of
complications such as multiple data channels (with different beams), inhomogeneous noise,
the sky cut, or modes of the data which we want to marginalize, such as the monopole and
dipole.
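To make the structure of Eq. (1) concrete, here is a minimal toy sketch in a small finite-dimensional "mode" space, assuming NumPy. Everything in it is hypothetical: the covariance C is a random positive-definite matrix, the bispectrum B is a random symmetric tensor, and M is taken as the response of the estimator to data with <a_i a_j a_k> = f_NL B_ijk. It illustrates two properties of the cubic-minus-linear form on Gaussian data: the estimator has zero mean, and subtracting the linear term lowers the variance relative to the cubic term alone.

```python
import numpy as np

def symmetrize(t):
    """Average a rank-3 tensor over the 6 permutations of its indices."""
    perms = [(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]
    return sum(np.transpose(t, p) for p in perms) / 6.0

def toy_estimator(a, cinv, bisp, norm):
    """Toy Eq. (1): returns (cubic - linear) / M and, for comparison, cubic / M."""
    h = cinv @ a                                            # C^-1 filtered data
    cubic = np.einsum("ijk,i,j,k->", bisp, h, h, h)
    linear = 3.0 * np.einsum("ijk,ij,k->", bisp, cinv, h)   # uses <h h^T> = C^-1
    return (cubic - linear) / norm, cubic / norm

rng = np.random.default_rng(0)
n = 5                                            # toy number of modes (hypothetical)
x = rng.normal(size=(n, n))
cov = x @ x.T + n * np.eye(n)                    # toy signal + noise covariance C
cinv = np.linalg.inv(cov)
bisp = symmetrize(rng.normal(size=(n, n, n)))    # toy bispectrum for f_NL = 1

# M: response of the estimator to data with <a_i a_j a_k> = f_NL * B_ijk
norm = np.einsum("ijk,il,jm,kn,lmn->", bisp, cinv, cinv, cinv, bisp)

# Gaussian simulations (f_NL = 0) drawn with covariance C
chol = np.linalg.cholesky(cov)
sims = [toy_estimator(chol @ rng.normal(size=n), cinv, bisp, norm) for _ in range(4000)]
full, cubic_only = map(np.array, zip(*sims))
print(f"mean {full.mean():.3f}; std with linear term {full.std():.3f}, "
      f"cubic only {cubic_only.std():.3f}")
```

The linear term removes exactly the part of the cubic term that is linear in the data (its contraction with <(C^-1 a)(C^-1 a)> = C^-1), which is why for Gaussian data it can only reduce the variance. A real analysis works in spherical harmonics and never forms B or C^-1 as dense arrays.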

Current estimates of f_NL^local from WMAP data [10, 11] do not use the optimal estimator, due
to the implementational difficulty and CPU cost of the C^-1 operation. Instead, a suboptimal
estimator is defined by replacing C^-1 in Eq. (1) by a heuristically constructed filter. The
error on f_NL^local using the suboptimal estimator is about 40% larger than the optimal value. In
this paper, we will present the first optimal analysis of WMAP data. The details of our C^-1
implementation are taken from [14] and summarized in Appendix A.

In addition to reducing the error, the optimal estimator also has the advantage of eliminating
a posteriori choices that can introduce bias or complicate interpretation of the results.
This is particularly important in WMAP, where the evidence for nonzero f_NL^local currently has
borderline statistical significance, and is sensitive to the choice of l_max. The suboptimal
estimator is not unique: different implementations can make different "arbitrary" choices of
weighting in several places and as a result obtain different estimates of f_NL^local. Furthermore,
in both [10] and [11] the uncertainty σ(f_NL^local) decreases with l_max for l_max < 400 and then slightly
increases for larger l_max, making it unclear what is the best-motivated choice of l_max for a
"bottom-line" estimate of f_NL^local. In contrast, the optimal estimator is unique and σ(f_NL^local) is
a decreasing function of l_max which eventually saturates.

2.2 Foreground marginalization 

We use the WMAP foreground model: the total foreground contribution to the temperature
at frequency ν in pixel n is given by

T_{fg}(\hat{n}) = b_1(\nu) T_{synch}(\hat{n}) + b_2(\nu) T_{ff}(\hat{n}) + b_3(\nu) T_{dust}(\hat{n})        (2)

where the functions b_i(ν) encode the frequency dependence of the foregrounds, and T_synch(n),
T_ff(n), T_dust(n) are spatial templates for synchrotron, free-free and dust emission. For more
details, including construction of the spatial templates and the procedure for estimating b_i(ν),
see [20].

The WMAP data release includes "clean maps", which are obtained by subtracting T_fg(n)
from the "raw maps" which are directly observed in each channel. In [10], the f_NL^local estimator
was applied to clean maps, assuming that any systematic error from foregrounds is small.
(This assumption is tested by checking frequency dependence and dependence on the mask.)
In [11], foregrounds were treated by applying the estimator to the raw maps, and assuming
that any bias due to foregrounds is negative. Under this assumption, the raw-map estimate
is a lower bound on f_NL^local even in the presence of foregrounds.

2 In reality, as shown in [16] and verified numerically in [18], in the case of a significant detection of f_NL^local the
estimator in Eq. (1) becomes suboptimal, and a simple correction to the normalization has to be implemented
to make it optimal again [16]. For the central value of f_NL^local that we find from our analysis of the WMAP
5 yr data this effect is quite irrelevant, affecting our error bars at the order of 10%, and we decide to neglect
it. In particular, if we just want to determine if and at what statistical level a zero value of f_NL^local is excluded,
then the result of our estimator when applied to the data has to be compared against Gaussian simulations.
In this case, our estimator is always optimal. Since no significant detection of f_NL^local has been made so far,
this will be the approach taken in this paper.

Using the optimal estimator, there is a third possibility for handling foregrounds: one can
marginalize over the templates by modifying the noise covariance N so that each template
T_j(n) is assigned infinite variance. The optimal estimator in Eq. (1) then estimates f_NL^local in a
way which is "blind" to the amplitude of the template modes in the data. The variance of the
estimator will be slightly increased to account for this loss of information. We marginalize
the templates independently in each WMAP channel to avoid making any assumptions about
the functions b_i(ν). Using the foreground-marginalized optimal estimator, direct template
cleaning of the maps is not necessary: the f_NL^local estimates from raw and clean maps will be
the same. This estimator will be our default choice in the rest of the paper unless otherwise
specified.
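Assigning infinite variance to the template modes has a closed form. By the Woodbury identity, the limit of (C + λ T T^T)^-1 as λ → ∞ is C^-1 - C^-1 T (T^T C^-1 T)^-1 T^T C^-1, where the columns of T are the templates. The sketch below, assuming NumPy and toy random matrices in place of the real WMAP covariance and templates, checks that this filter annihilates the templates, so raw and clean maps give identical filtered data. It is a dense-matrix illustration only, not a description of the actual C^-1 implementation.

```python
import numpy as np

def marginalized_inverse(cinv, templates):
    """Limit of (C + lam * T T^T)^-1 as lam -> infinity, via the Woodbury identity.

    cinv: C^-1, shape (n, n); templates: T, shape (n, k), one template per column.
    """
    ct = cinv @ templates
    return cinv - ct @ np.linalg.solve(templates.T @ ct, ct.T)

rng = np.random.default_rng(1)
npix, ntemp = 50, 3
x = rng.normal(size=(npix, npix))
cov = x @ x.T / npix + np.diag(rng.uniform(0.5, 2.0, npix))  # toy S + N
templates = rng.normal(size=(npix, ntemp))  # toy stand-ins for synch, free-free, dust

minv = marginalized_inverse(np.linalg.inv(cov), templates)
clean = rng.normal(size=npix)
raw = clean + templates @ np.array([3.0, -1.5, 0.7])  # arbitrary template amplitudes

# The marginalized filter is blind to template modes, so raw and clean maps agree
print(np.allclose(minv @ raw, minv @ clean))
```

Because the template amplitudes drop out exactly, there is no need to assume values for b_i(ν), which is the property the text relies on when marginalizing each channel independently.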

3 Results 

Our best constraint on f_NL^local comes from our optimal analysis applied to WMAP 5 year data
using the foreground marginalization technique. We find f_NL^local = 38 ± 21 at 1σ. The
primordial fluctuations are consistent with Gaussian: -4 < f_NL^local < 80 at 95% CL. It is
important to point out that our analysis of the WMAP data results in error bars that are
smaller than previous analyses as a result of our using the optimal estimator. This can be
clearly seen in Figure 2, where we directly compare optimal and suboptimal estimators for
otherwise the same choices of analysis parameters. Note that the optimal estimator is already
better at relatively large scales, but it gets substantially better for large values of l_max.
Given the importance of a detection of non-Gaussianity, it is important to understand how
robust our results are to various choices of analysis parameters and data. We explore these
issues in the next two sections.

4 WMAP 5 vs WMAP 3 

The differences between the results of [10] and [11] might lead to the suspicion that something
changed in the data between the 3 and 5 year data releases.

When we analyze the 3-year dataset with the optimal foreground-marginalized estimator,
we find f_NL^local = 58 ± 23. Thus, between the 3-year and 5-year datasets, we find a shift
Δf_NL^local = -20 in the value of f_NL^local. This may seem too large to be a statistical event, given
that the error on f_NL^local is not much better in the 5-year data than in the 3-year data. What
is the cause of this shift, and is it consistent with statistics?

There are four differences between the 3-year and 5-year datasets which are relevant for our
f_NL^local analysis: different maps (3 years of data vs 5), different foreground mask (Kp0 vs KQ75),
different beams, and different best-fit cosmological parameters. We find that the changes to
the beams and cosmological parameters have a negligible effect on f_NL^local. In Figure 3, we
compare the changes due to the updated maps and updated mask. Splitting the overall
estimate into independent l bins, it is seen that the change to f_NL^local mainly comes from
l ≈ 450, where it is mostly due to the updated maps.

Figure 2: Constraints on f_NL^local using 5-year data and KQ75 mask, using both the optimal
estimator (squares) and the old estimator applied to clean maps (triangles). The top panel
shows cumulative results (constraints using all the information up to a given l), while the
bottom one shows contributions from separate l bins. Our overall f_NL^local estimate, taking
l_max = 750, is 38 ± 21 for the optimal estimator and 55 ± 33 for the suboptimal one.
Figure 3: Estimates of f_NL^local using the optimal foreground-marginalized estimator with 3-year
data and mask (squares), 5-year data with 3-year mask (triangles), and 5-year data and mask
(circles). We have shown the contributions from separate l bins; the overall estimates of f_NL^local
obtained by summing all bins are 58 ± 23, 37 ± 21 and 38 ± 21 respectively.

As a test for systematics, we generated "paired" simulations of the 3-year and 5-year
datasets. Each pair consists of a 3-year simulation and a 5-year simulation which share
the same CMB realization. The noise realization in each 5-year simulation is constructed by
combining the noise realization from the corresponding 3-year simulation with an independent
noise realization corresponding to 2 years of integration time, in a way which mimics how
the data from different years are combined in the real WMAP data. We find that the RMS
Δf_NL^local between the 3-year simulations and the 5-year simulations is 16, so the shift observed
in the data (|Δf_NL^local| = 20, about 1.25 times the RMS) is within statistics. The same is true for each individual l bin in Figure 3. We
conclude that there is nothing dramatically different in the two data sets.
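The pairing of noise realizations can be illustrated with white noise. The sketch below, assuming NumPy, makes two simplifying assumptions that are not taken from the WMAP pipeline: per-pixel noise variance scales as the inverse of integration time, and years are combined with integration-time weights (the real combination uses per-pixel observation counts and channel weights). Under these assumptions the 5-year noise has variance σ²/5 and covariance σ²/5 with the 3-year noise, so the two members of a pair are correlated through both the shared CMB and the shared 3 years of noise.

```python
import numpy as np

rng = np.random.default_rng(2)
npix, nsims = 1000, 200
sigma_1yr = 1.0  # toy per-pixel noise rms for one year of data (hypothetical)

def white_noise(years, shape):
    """White noise whose variance scales as 1 / (integration time)."""
    return rng.normal(scale=sigma_1yr / np.sqrt(years), size=shape)

n3 = white_noise(3, (nsims, npix))   # noise of each 3-year simulation
n2 = white_noise(2, (nsims, npix))   # independent noise for the 2 extra years
n5 = (3 * n3 + 2 * n2) / 5           # integration-time weighted 5-year noise

var5 = n5.var()
cov35 = np.mean(n3 * n5)
corr = cov35 / np.sqrt(n3.var() * var5)
print(f"5*Var(n5) = {5 * var5:.3f}, 5*Cov(n3, n5) = {5 * cov35:.3f}, corr = {corr:.3f}")
```

The expected noise correlation between the paired maps is sqrt(3/5) ≈ 0.77, which is why the RMS shift between paired simulations (rather than between independent ones) is the right yardstick for the observed Δf_NL^local.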

This comparison between WMAP3 and WMAP5 assumes the optimal estimator. If the
suboptimal estimator is used instead, we also find an RMS Δf_NL^local between 3-year and 5-year
equal to 16, so the difference between the WMAP3 result reported in [11] and the
WMAP5 result reported in [10] (f_NL^local = 87 ± 30 and f_NL^local = 58 ± 36 respectively, for large
l_max) is marginally consistent with being a statistical event.

One puzzling feature of the 3-year dataset is the large value of f_NL^local reported in [11] at
l_max = 750, compared to estimates at l_max = 350 which had been reported previously [21, 22].
Using our pipeline, we see a change of Δf_NL^local = 20 (optimal estimator) or Δf_NL^local = 32
(suboptimal estimator) between these values of l_max, a less dramatic shift than the Δf_NL^local =
52 change reported in [11]. In simulations we find that the RMS change in f_NL^local between these
values of l_max is ~19, so it appears difficult to interpret a change as large as 52 as a statistical
event. (We find an RMS change of ~19 for both the optimal and suboptimal estimators, and with
either 3-year or 5-year data, so it seems to be a robust quantity.)

There also appears to be some systematic tendency for our pipeline to produce lower f_NL^local
values, compared to the 3-year results of [11], at large l_max (Figure 4, top panel). If we take
l_max = 750 and apply the suboptimal estimator to 3-year raw maps for comparison with [11],
we get f_NL^local = 69 ± 30 (disfavoring f_NL^local = 0 at 2.3σ), whereas f_NL^local = 87 ± 30 (disfavoring
f_NL^local = 0 at 2.9σ) was reported in [11]. The reason for this disagreement is unclear, but may
simply be the result of making different choices of weighting in the suboptimal estimator. This
is good motivation for using the optimal estimator, which is unique, so that different
implementations should agree precisely.

The agreement between our pipeline and the 5-year results from [10] is better (Figure 4,
bottom panel). Furthermore, we have internally compared the non-optimal pipeline in this
paper with the one we used in [21] and found them to agree at the percent level. Both
pipelines were independently developed.



5 Foregrounds 

5.1 Large-scale galactic foregrounds 

Perhaps the most worrying systematic effect in this analysis is non-Gaussian contamination
by foregrounds. Let us denote the foreground-marginalized optimal estimator by f̃_NL and
the optimal estimator constructed without marginalizing foregrounds by f̂_NL. We can get a
crude idea of how important foregrounds are, at the order-of-magnitude level, by comparing
the raw-map value of f̂_NL, the clean-map value of f̂_NL, and the value of f̃_NL. (As described in
§2.2, f̃_NL gives the same value when applied to raw or clean maps.) In the five-year dataset
with KQ75 mask, we find that f̃_NL agrees well with f̂_NL(clean), and f̂_NL(raw) is larger by
≈10 (Figure 5).

This result suggests that foreground contamination is mild if the KQ75 mask is used,
but also raises a puzzle. In [11] it was argued that foregrounds always make a negative
contribution to f_NL^local even in a single realization, so that the raw-map estimate can be taken
as a lower bound on the true value of f_NL^local. (For example, the systematic error due to
foregrounds outside the Kp0 mask is quoted as a "one-sided" range.) However, with the
optimal estimator and 5-year dataset, we find that (f̂_NL(raw) - f̂_NL(clean)) is positive. Is this
a sign that something is wrong with the estimator, or with the foreground model? What general
statement can we make about the sign of (f̂_NL(raw) - f̂_NL(clean))?



[Figure 4 plot. Top panel legend: 3-year, optimal foreground-marginalized; 3-year, suboptimal raw-map; 3-year, suboptimal raw-map from [11]. Bottom panel legend: 5-year, optimal foreground-marginalized; 5-year, suboptimal clean-map; 5-year, suboptimal clean-map from [10]. Horizontal axis from 400 to 800.]

Figure 4: Top panel: comparison between 3-year results reported in [11] and results obtained
from our pipeline, using either the optimal or suboptimal estimator. We apply the suboptimal
estimator to 3-year raw maps for consistency with [11]. Bottom panel: comparison between
5-year results (suboptimal estimator, clean maps) reported in [10] and results obtained from our
pipeline using the optimal or suboptimal estimator. We apply the suboptimal estimator to
5-year clean maps for consistency with [10].
Figure 5: Comparison between the foreground-marginalized optimal estimator (f̃_NL) and the
optimal estimator without foreground marginalization (f̂_NL) applied to either raw or clean
5-year maps with KQ75 mask.

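The raw maps, clean maps, and foreground maps are related (with the C^-1 filter applied) by:

(C^{-1}a)^{raw}_{\ell m} = (C^{-1}a)^{clean}_{\ell m} + (C^{-1}a)^{fg}_{\ell m}        (3)

Using Eq. (1), we can write (f̂_NL(raw) - f̂_NL(clean)) as the sum of three terms:

\hat{f}_{NL}(raw) - \hat{f}_{NL}(clean) = (FTT) + (FFT) + (FFF)        (4)

where we have defined:

(FTT) = \frac{1}{M} \sum_{\ell_i m_i} B^{m_1 m_2 m_3}_{\ell_1 \ell_2 \ell_3} \left[ 3 (C^{-1}a)^{fg}_{\ell_1 m_1} (C^{-1}a)^{clean}_{\ell_2 m_2} (C^{-1}a)^{clean}_{\ell_3 m_3} - 3\, C^{-1}_{\ell_1 m_1, \ell_2 m_2} (C^{-1}a)^{fg}_{\ell_3 m_3} \right]
(FFT) = \frac{3}{M} \sum_{\ell_i m_i} B^{m_1 m_2 m_3}_{\ell_1 \ell_2 \ell_3} (C^{-1}a)^{fg}_{\ell_1 m_1} (C^{-1}a)^{fg}_{\ell_2 m_2} (C^{-1}a)^{clean}_{\ell_3 m_3}
(FFF) = \frac{1}{M} \sum_{\ell_i m_i} B^{m_1 m_2 m_3}_{\ell_1 \ell_2 \ell_3} (C^{-1}a)^{fg}_{\ell_1 m_1} (C^{-1}a)^{fg}_{\ell_2 m_2} (C^{-1}a)^{fg}_{\ell_3 m_3}        (5)

(Note that we have chosen to include the linear term in the FTT piece. This is the most
natural choice since it ensures that ⟨FTT⟩ = 0, where the expectation value is taken over
random CMB realizations with the foreground template fixed.)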
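The decomposition in Eqs. (3)-(5) can be checked numerically in the same kind of toy setting: a hypothetical random covariance, a random symmetric bispectrum, and a fixed filtered foreground vector, assuming NumPy. The sketch verifies that the estimator shift between raw (clean + foreground) and clean data equals (FTT) + (FFT) + (FFF) realization by realization, and that (FTT) averages to zero over clean-data realizations. The common factor 1/M is dropped, since it affects neither check.

```python
import numpy as np

def symmetrize(t):
    """Average a rank-3 tensor over the 6 permutations of its indices."""
    perms = [(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]
    return sum(np.transpose(t, p) for p in perms) / 6.0

def cubic(bisp, x, y, z):
    """Contraction sum_ijk B_ijk x_i y_j z_k."""
    return np.einsum("ijk,i,j,k->", bisp, x, y, z)

def linear(bisp, cinv, h):
    """Linear term 3 sum_ijk B_ijk C^-1_ij h_k."""
    return 3.0 * np.einsum("ijk,ij,k->", bisp, cinv, h)

def estimator(h, cinv, bisp):
    """Eq. (1) acting on filtered data h = C^-1 a, without the 1/M factor."""
    return cubic(bisp, h, h, h) - linear(bisp, cinv, h)

def split_terms(h_t, h_f, cinv, bisp):
    """(FTT), (FFT), (FFF) of Eq. (5), without the 1/M factor."""
    ftt = 3.0 * cubic(bisp, h_f, h_t, h_t) - linear(bisp, cinv, h_f)
    fft = 3.0 * cubic(bisp, h_f, h_f, h_t)
    fff = cubic(bisp, h_f, h_f, h_f)
    return ftt, fft, fff

rng = np.random.default_rng(3)
n = 6                                      # toy number of modes (hypothetical)
x = rng.normal(size=(n, n))
cov = x @ x.T + n * np.eye(n)              # covariance of clean (CMB + noise) data
cinv = np.linalg.inv(cov)
chol = np.linalg.cholesky(cov)
bisp = symmetrize(rng.normal(size=(n, n, n)))
h_f = cinv @ rng.normal(size=n)            # fixed, filtered foreground map

ftt_samples = []
for _ in range(2000):
    h_t = cinv @ (chol @ rng.normal(size=n))   # new clean-data realization
    terms = split_terms(h_t, h_f, cinv, bisp)
    shift = estimator(h_t + h_f, cinv, bisp) - estimator(h_t, cinv, bisp)
    assert np.isclose(shift, sum(terms))       # Eq. (4), realization by realization
    ftt_samples.append(terms[0])

ftt_samples = np.array(ftt_samples)
print(f"<FTT> = {ftt_samples.mean():.4f} +/- {ftt_samples.std() / np.sqrt(len(ftt_samples)):.4f}")
```

The zero mean of (FTT) relies on B being symmetric, so that the Wick contraction of the cubic piece cancels the linear piece; this is also why its sign in any single realization is not fixed.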

For the WMAP foreground model and five-year dataset, we find (FTT) = 10.4, (FFT) = -0.1,
and (FFF) = -0.2: the shift in f_NL^local between the raw and clean maps is entirely due to
the (FTT) term. This term represents accidental correlation between large-scale foregrounds
and small-scale non-foreground power, and is equally likely to be positive or negative. We
conclude that it is not safe to assume that (f̂_NL(raw) - f̂_NL(clean)) is negative. The value
depends on the way that the small-scale modes are filtered and can be different for the optimal
and suboptimal estimators.

In this paper, we will treat large-scale foregrounds by marginalizing the template amplitudes
in the optimal estimator as described in §2.2. When we use the suboptimal estimator,
we will simply analyze clean maps and neglect the (small) extra uncertainty in f_NL^local due to
uncertainty in the template amplitudes. In principle, one could estimate f_NL^local from raw maps
and treat the zero-mean (FTT) term as systematic error from foregrounds, by increasing the
uncertainty σ(f_NL^local). However, assuming the WMAP foreground model, we note that the
size of the (FTT) term in the WMAP data is 10.4, whereas the RMS value in simulation is
4.0. (The simulations were constructed by evaluating (FTT) in Eq. (5) using a random CMB
realization and the WMAP foreground templates.) The larger value seen in the data can be
interpreted as a test for foreground contamination that is failing at 2.5σ, but it is unclear
how to interpret this further.



5.2 Small-scale galactic foregrounds 

The WMAP foreground templates are smoothed with a 1° beam, and therefore the
template-marginalization procedure does not remove foregrounds on small angular scales. This may
contaminate the f_NL^local estimator, which is a cross-correlation between long-wavelength
temperature (l ≈ 20) and small-scale power (l ≈ 350). Such contamination, if present, is not
included in the analysis of the preceding subsection, in which it is assumed that the
templates agree perfectly with the real foregrounds.
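The scale separation can be quantified with the standard transfer function of a circular Gaussian beam, b_l = exp(-l(l+1)σ_b²/2) with σ_b = FWHM/sqrt(8 ln 2). The sketch below assumes the 1° template smoothing is exactly Gaussian and evaluates b_l at the two scales mentioned above.

```python
import math

def gaussian_beam(ell, fwhm_deg):
    """Harmonic-space transfer function b_l of a circular Gaussian beam."""
    sigma = math.radians(fwhm_deg) / math.sqrt(8.0 * math.log(2.0))
    return math.exp(-0.5 * ell * (ell + 1) * sigma**2)

# Large-scale leg (l ~ 20) vs small-scale leg (l ~ 350) of the f_NL^local triangles
for ell in (20, 350):
    print(f"l = {ell}: b_l = {gaussian_beam(ell, 1.0):.3f}")
```

For a 1° FWHM this gives b_l ≈ 0.99 at l = 20 but only about 0.03 at l = 350, so template power near l ≈ 350 is strongly suppressed and foregrounds on those scales cannot be removed by marginalizing the smoothed templates.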

We can roughly estimate the contamination due to uncleaned small-scale galactic foregrounds
in the following way. The dust template is available at high resolution [23], so we
can define a "clean+" map for each WMAP channel by:

T^{clean+}(\hat{n}) = T^{raw}(\hat{n}) - b_1(\nu) T^{smth}_{synch}(\hat{n}) - b_2(\nu) T^{smth}_{ff}(\hat{n}) - b_3(\nu) T^{hires}_{dust}(\hat{n})        (6)

where "smth" denotes a foreground template smoothed with a 1° beam, and "hires" denotes
a template smoothed with the instrumental beam. The clean and clean+ maps agree on large
scales, but on small scales foregrounds have been partially subtracted in the clean+ maps. We
find that the f_NL^local estimates from clean and clean+ maps are negligibly different (Δf_NL^local <
0.75 in all l bands). Our clean+ maps only include small-scale foreground contributions
from dust; however, dust is expected to be the largest small-scale foreground in W-band, and
comparable to the other foregrounds in V-band. (We checked this using the MEM maps from
the WMAP 5-year release.) This suggests that the impact of small-scale galactic foregrounds
is negligible.
5.3 Point sources



The largest source of small-scale foreground power in WMAP is unresolved point sources,
mainly radio sources. If point sources are assumed isotropic and unclustered, it has been
shown [11] that the f_NL^local contamination is negligible. However, the clustering of sources in
WMAP is not well characterized, and the number density of unresolved sources may be larger
near the galactic plane, and is certainly larger in the ecliptic plane, where the larger noise
level in WMAP makes it harder to detect and mask sources. For this reason, we would like
to do a direct test for point source contamination by deliberately weakening the point source
mask and comparing the value of f_NL^local.

A second concern, pointed out in [10], is that the WMAP source detection procedure has
a lower effective flux threshold in regions where the local CMB temperature is higher than
average. This may negatively correlate the level of unresolved point source power with the CMB
temperature and fake a positive f_NL^local signal. To address this concern at the same time, we
construct a "KQ75-CW" mask by leaving the galactic part of the KQ75 mask unchanged,
but replacing the source mask by one constructed from the Chen & Wright [21] catalog,
in which only difference maps between WMAP frequencies are used to detect sources. Our
KQ75-CW mask contains ~40% as many sources as the KQ75 mask (this is largely due to
the use of sources from external catalogs in KQ75 [25]), but the unresolved sources should be
uncorrelated with the CMB.

Applying the optimal foreground-marginalized estimator to the five-year data using both
the KQ75 and KQ75-CW masks, we find that the difference in f_NL^local is very small (Δf_NL^local < 2;
the precise value depends on l_max). We conclude that any bias due to correlations between
the KQ75 point source mask and the CMB is negligible. This test also suggests (but does
not prove) that any bias due to unresolved point sources is small, since most of the sources
masked by KQ75 are not masked by KQ75-CW. The systematic error from unresolved sources
was calculated by the WMAP team in [10] using Monte Carlo simulations, and found to be
small compared to the statistical error.



5.4 Other tests for foreground contamination 

In Figure 6 we compare the optimal foreground-marginalized estimator using five-year V-band
data, W-band data, and the combined (V+W) result shown previously. The three cases agree
well at low l, where the data are signal-dominated, and deviate somewhat at high l, where the
noise realizations in V-band and W-band are independent. The differences between V-band
and W-band are consistent with simulations, i.e. no evidence is seen for a frequency-dependent
signal.

Because foregrounds in WMAP are most important on the largest scales (particularly
l = 2), another test we can do for foreground contamination is to fix l_max = 750 and vary
the minimum multipole l_min that is used to estimate f_NL^local. One of the most striking results
reported in [11] is that even with l_min = 20, where about half the statistical weight is lost,
evidence for positive f_NL^local is still seen in three-year raw maps at high significance: f_NL^local =
135 ± 48 at 1σ. We do not see such a signal, finding f_NL^local = 48 ± 56 (suboptimal estimator,
raw maps) or f_NL^local = 50 ± 51 (optimal estimator, foreground-marginalized) using three-year
data, Kp0 mask, l_min = 20 and l_max = 750. Results for five-year data are shown in Figure 7.
The statistical significance of nonzero f_NL^local stays roughly constant out to l_min ~ 6, and then
decreases.

Figure 6: Comparison between V-band data, W-band data, and (V+W) combined, using the
optimal foreground-marginalized estimator, five-year data, and KQ75 mask. We show the
contributions to f_NL^local from independent l bins.



6 Summary 

We have applied the optimal estimator of [13], with the implementation developed in [14],
to the 5 year WMAP data. Our results are summarized in Table 1. Marginalizing over the
amplitude of foreground templates we get -4 < f_NL^local < 80 at 95% CL. Error bars of previous
analyses are roughly 40% larger than these. The probability that a Gaussian simulation,
analyzed using our estimator, gives a result larger in magnitude than the one we find is 7%.

We did extensive tests of our results, including implementing our own sub-optimal estimator
to compare with published results. We concluded that:

• The optimal estimator outperforms the sub-optimal one both on large and small angular 
scales. 

• The differences we see between the results of our pipeline when applied to the three and 
five year WMAP data releases are consistent with being statistical fluctuations. 
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• • 5-year kq75, optimal foreground-marginalized 
o 5 10 15 20 

Figure 7: /]^£ al estimates with varying minimum multipole £ m i n , using the optimal foreground- 
marginalized estimator, five-year data, KQ75 mask, and taking £ max = 750 throughout. 

• Our implementation of the sub-optimal estimator is in good agreement with the results obtained by the WMAP team in [10]. We see some small but significant discrepancies with the results of Yadav & Wandelt [11] on small angular scales. We see no significant differences in excess of noise between the results of our optimal and sub-optimal estimators.

• The fluctuations we see between results in different $\ell$ bands are consistent with being statistical.

• After foreground template marginalization we do not see any evidence of foregrounds affecting our results in a significant way. We see no evidence for frequency dependence in our results, and our constraints are robust to the choice of minimum $\ell$ value allowed in the triangles.

• The foreground contamination present outside of KQ75 in the raw maps seems to create a spurious non-Gaussian signal that shifts the mean value of $f_{NL}^{\rm local}$ up by about half a $\sigma$. This happens through an accidental correlation between the CMB and foreground signals. This term does not have a definite sign; it is realization-dependent. We conclude that using raw maps instead of foreground-marginalized maps does not in general guarantee a conservative lower limit on $f_{NL}^{\rm local}$.

• The lack of small scale power in the foreground templates does not appear to bias our 
results in any noticeable way. Our results are robust to the choice of point source mask. 
Table 1: Estimated values of $f_{NL}^{\rm local}$, with $1\sigma$ errors, for various choices of dataset, mask, and foreground cleaning procedure used throughout this paper.
A Implementation of the optimal estimator

In this appendix, we describe some implementational details of our analysis pipeline. 
A.1 $C^{-1}$ implementation

We represent the WMAP data as a length-$N_{\rm pix}$ data vector $d_i$ and an $N_{\rm pix}$-by-$N_{\rm pix}$ noise covariance matrix $N_i$, for each WMAP channel $i = 1, 2, \ldots, N_{\rm chan}$. In the WMAP noise model, different pixels are uncorrelated, i.e. each $N_i$ is a diagonal matrix. We represent the CMB realization in harmonic space by a length-$N_{\rm alm}$ vector $a$, where $N_{\rm alm}$ is the number of linearly independent multipoles $a_{\ell m}$ such that $\ell \le \ell_{\max} = 1000$. For each channel, we introduce an $N_{\rm pix}$-by-$N_{\rm alm}$ matrix $A_i$ which combines the beam convolution and spherical transform operations, so that we can write:

$$ d_i = A_i a + n_i \tag{7} $$

where the noise $n_i$ is a length-$N_{\rm pix}$ vector. We write the signal covariance corresponding to the fiducial power spectrum as an $N_{\rm alm}$-by-$N_{\rm alm}$ matrix $S$ (thus $\langle a a^T \rangle = S$ and $\langle n_i n_j^T \rangle = N_i \delta_{ij}$).
Given the data in Eq. (7), there is an optimal (minimum-variance) map $\hat a$ [26], which is an unbiased estimator of the CMB realization $a$:

$$ \hat a = a + \eta \tag{8} $$

$$ \langle \eta\, \eta^T \rangle = N \tag{9} $$

The optimal map $\hat a$ and its $N_{\rm alm}$-by-$N_{\rm alm}$ noise covariance $N$ are jointly defined by:

$$ N^{-1} \hat a = \sum_{i=1}^{N_{\rm chan}} A_i^T N_i^{-1} d_i \tag{10} $$

$$ N^{-1} = \sum_{i=1}^{N_{\rm chan}} A_i^T N_i^{-1} A_i \tag{11} $$

Going from the data in Eq. (7) to the optimal map $\hat a$ does not lose information, so one can think of the data as if the CMB realization $a$ were directly observed in harmonic space, with noise covariance $N$ defined by Eq. (11).
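The sketch below illustrates Eqs. (10) and (11) on a toy problem. The matrices in `A_list` are small dense random arrays standing in (hypothetically) for the beam-plus-spherical-transform operators $A_i$, and no pixels are masked, so $N^{-1}$ can be inverted explicitly; at WMAP resolution one would never form these matrices. With noiseless data, $\eta = 0$ in Eq. (8) and the combined map should reproduce $a$ exactly.

```python
import numpy as np


def optimal_map(A_list, Ninv_diag_list, d_list):
    """Toy dense combination of channels following Eqs. (10)-(11).

    A_list: list of (n_pix, n_alm) arrays, hypothetical stand-ins for A_i.
    Ninv_diag_list: list of length-n_pix arrays holding the diagonal of N_i^{-1}.
    d_list: list of length-n_pix data vectors d_i.
    Returns (a_hat, N), with N the harmonic-space noise covariance.
    """
    n_alm = A_list[0].shape[1]
    Ninv = np.zeros((n_alm, n_alm))
    Ninv_ahat = np.zeros(n_alm)
    for A, w, d in zip(A_list, Ninv_diag_list, d_list):
        Ninv += A.T @ (w[:, None] * A)   # Eq. (11): sum_i A_i^T N_i^{-1} A_i
        Ninv_ahat += A.T @ (w * d)       # Eq. (10): sum_i A_i^T N_i^{-1} d_i
    N = np.linalg.inv(Ninv)              # possible only because nothing is masked
    return N @ Ninv_ahat, N


rng = np.random.default_rng(0)
n_pix, n_alm = 40, 10
a = rng.standard_normal(n_alm)                        # toy CMB realization
A_list = [rng.standard_normal((n_pix, n_alm)) for _ in range(3)]
Ninv_list = [np.full(n_pix, 1.0 / s**2) for s in (0.5, 1.0, 2.0)]
d_noiseless = [A @ a for A in A_list]                 # Eq. (7) with n_i = 0
a_hat, N = optimal_map(A_list, Ninv_list, d_noiseless)
```

Channels with smaller noise (larger entries of $N_i^{-1}$) automatically receive more weight in the sum, so no relative channel weighting has to be chosen by hand.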

To evaluate the optimal $f_{NL}^{\rm local}$ estimator (Eq. (14)), we need to compute $C^{-1}\hat a = (S + N)^{-1}\hat a$. Formally, this is straightforward using Eqs. (10), (11), but in practice there are two computational obstacles:

1. The WMAP resolution is too large for dense $N_{\rm pix}$-by-$N_{\rm pix}$ linear algebra (or $N_{\rm alm}$-by-$N_{\rm alm}$ linear algebra) to be computationally feasible.

2. The inverse noise covariance $N^{-1}$ will usually not be invertible.

To explain the second problem better, we note that in our pipeline, we represent the sky cut by assigning infinite noise to the pixels which are masked. Therefore the inverse noise covariance $N_i^{-1}$ in pixel space is a non-invertible diagonal matrix (the entries corresponding to masked pixels are zero). Going to harmonic space, the operator $N^{-1}$ and the vector $N^{-1}\hat a$ in Eqs. (10), (11) are still defined, but $N$ and $\hat a$ generally will not be, because $N^{-1}$ is not invertible.

To solve the first problem, we first observe that there is a computationally efficient procedure for multiplying a length-$N_{\rm alm}$ vector by the matrix $N^{-1}$. This follows from Eq. (11), since multiplication by the operator $A_i$ (or $A_i^T$) can be done using a fast spherical transform, and multiplication by $N_i^{-1}$ is trivial because $N_i$ is a diagonal matrix in the WMAP noise model. The next step is to write

$$ (S + N)^{-1} \hat a = S^{-1} \left( S^{-1} + N^{-1} \right)^{-1} N^{-1} \hat a \tag{12} $$

Since we can compute $N^{-1}\hat a$ using Eq. (10), the only missing ingredient is an efficient procedure for multiplying a vector by the matrix $(S^{-1} + N^{-1})^{-1}$. Note that we have already described an efficient procedure for performing the "forward" operation $(S^{-1} + N^{-1})$. Given such a procedure, conjugate gradient inversion [27] is a well-known iterative method for performing the inverse operation $(S^{-1} + N^{-1})^{-1}$ which avoids direct matrix inversion. Obtaining rapid convergence with conjugate gradient inversion usually depends on constructing a good preconditioner, i.e. an approximate, inexpensive inverse operation. Using the multigrid preconditioner from [TJJ], the computational cost of each $C^{-1}$ operation is about 10 CPU-minutes for the WMAP5 V+W dataset.

This solution to the first problem above (infeasibility of dense linear algebra) also solves the second problem (noninvertibility of $N^{-1}$), since evaluating the right-hand side of Eq. (12) by conjugate gradient inversion only requires us to compute $N^{-1}\hat a$, and to multiply vectors by $N^{-1}$. In fact, we have deliberately written Eq. (12) in such a way that $\{N, \hat a\}$ have been eliminated in favor of $\{N^{-1}, N^{-1}\hat a\}$.
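A minimal sketch of Eq. (12) with conjugate gradient inversion, under simplifying assumptions: a single channel, a dense random matrix `A` in place of the fast spherical transform, a diagonal toy $S$, and a Jacobi (diagonal) preconditioner in place of the multigrid preconditioner mentioned above. The helper names `pcg` and `apply_Cinv` are hypothetical. Masked pixels are given zero inverse noise, and the code only ever touches $N^{-1}\hat a$ and products with $N^{-1}$, never $N$ or $\hat a$ themselves.

```python
import numpy as np


def pcg(apply_op, b, precond_diag, tol=1e-10, maxiter=1000):
    """Preconditioned conjugate gradient for a symmetric positive-definite operator.

    apply_op: function computing the "forward" product M @ x.
    precond_diag: elementwise approximate inverse of M (a simple Jacobi
    stand-in, not the multigrid preconditioner cited in the text).
    """
    x = np.zeros_like(b)
    r = b - apply_op(x)
    z = precond_diag * r
    p = z.copy()
    rz = r @ z
    for _ in range(maxiter):
        Mp = apply_op(p)
        alpha = rz / (p @ Mp)
        x += alpha * p
        r -= alpha * Mp
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = precond_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x


def apply_Cinv(S_diag, A, Ninv_pix, d):
    """Toy single-channel evaluation of (S + N)^{-1} a_hat via Eq. (12)."""
    Ninv_ahat = A.T @ (Ninv_pix * d)              # Eq. (10), one channel

    def forward(x):                               # (S^{-1} + N^{-1}) x, using Eq. (11)
        return x / S_diag + A.T @ (Ninv_pix * (A @ x))

    # Diagonal of (S^{-1} + N^{-1}); its elementwise inverse is the preconditioner
    diag = 1.0 / S_diag + np.einsum("pi,p,pi->i", A, Ninv_pix, A)
    x = pcg(forward, Ninv_ahat, 1.0 / diag)
    return x / S_diag                             # leading S^{-1} in Eq. (12)


rng = np.random.default_rng(1)
n_pix, n_alm = 200, 20
A = rng.standard_normal((n_pix, n_alm))          # hypothetical stand-in for A_i
S_diag = 1.0 / (1.0 + np.arange(n_alm))          # toy diagonal signal covariance
Ninv_pix = np.full(n_pix, 4.0)                   # white noise with variance 0.25
Ninv_pix[:50] = 0.0                              # masked pixels: infinite noise
a = np.sqrt(S_diag) * rng.standard_normal(n_alm)
d = A @ a + 0.5 * rng.standard_normal(n_pix)
y = apply_Cinv(S_diag, A, Ninv_pix, d)
```

Changing the data in masked pixels leaves the output unchanged, and on an unmasked toy sky the result agrees with a direct dense evaluation of $(S+N)^{-1}\hat a$; the dense forms are usable only as a check at toy sizes.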

Finally, we describe our implementation of template marginalization. We always marginalize templates in pixel space, and do the marginalization independently for each WMAP channel. Some results in this paper include foreground marginalization, i.e. we have marginalized three modes corresponding to the synchrotron, free-free and dust foregrounds. In addition, all results which use the optimal estimator include marginalization of the four modes corresponding to the monopole and dipole. Formally, let $T$ be an $N_{\rm tmpl}$-by-$N_{\rm pix}$ matrix which contains the pixel-space templates. If we denote the pixel-space inverse noise covariance with and without template marginalization by $\tilde N_i^{-1}$ and $N_i^{-1}$ respectively, then the two are related by:

$$ \tilde N_i^{-1} = \lim_{\eta \to +\infty} \left[ N_i + \eta\, T^T T \right]^{-1} = N_i^{-1} - N_i^{-1} T^T \left( T N_i^{-1} T^T \right)^{-1} T N_i^{-1} \tag{13} $$

The method described above for computing $C^{-1}\hat a$ only requires us to have a procedure for multiplying a vector by $\tilde N_i^{-1}$. This is straightforward using the right-hand side of Eq. (13), since $N_i$ is diagonal and the rest of the matrices are small enough that dense linear algebra is computationally feasible.
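The right-hand side of Eq. (13) can be applied to a pixel-space vector with only an $N_{\rm tmpl}$-by-$N_{\rm tmpl}$ dense solve, as in the sketch below. The templates are hypothetical (one constant, monopole-like row and three random rows standing in for foreground maps), and `apply_marginalized_Ninv` and `as_matrix` are illustrative names, not part of any released pipeline. The small matrix $T N_i^{-1} T^T$ must be invertible, which requires the templates to be linearly independent on the unmasked pixels.

```python
import numpy as np


def apply_marginalized_Ninv(Ninv_diag, T, v):
    """Multiply a pixel-space vector v by the right-hand side of Eq. (13).

    Ninv_diag: length-n_pix diagonal of N_i^{-1} (zero on masked pixels).
    T: (n_tmpl, n_pix) array whose rows are pixel-space templates.
    """
    NinvTt = Ninv_diag[:, None] * T.T     # N_i^{-1} T^T, shape (n_pix, n_tmpl)
    small = T @ NinvTt                    # T N_i^{-1} T^T, shape (n_tmpl, n_tmpl)
    return Ninv_diag * v - NinvTt @ np.linalg.solve(small, NinvTt.T @ v)


def as_matrix(Ninv_diag, T):
    """Build the full n_pix-by-n_pix matrix column by column (toy sizes only)."""
    n_pix = Ninv_diag.size
    return np.column_stack(
        [apply_marginalized_Ninv(Ninv_diag, T, e) for e in np.eye(n_pix)]
    )


rng = np.random.default_rng(4)
n_pix = 30
# Hypothetical templates: a constant row plus three random "foreground" rows.
T = np.vstack([np.ones(n_pix), rng.standard_normal((3, n_pix))])
Ninv_diag = rng.uniform(0.5, 2.0, n_pix)
Ninv_masked = Ninv_diag.copy()
Ninv_masked[:5] = 0.0                      # masked pixels
Nt_inv = as_matrix(Ninv_masked, T)
```

Two properties are worth checking: $\tilde N_i^{-1} T^T = 0$, so any combination of templates receives zero weight, and masked pixels stay masked. On an unmasked toy sky, the result can also be compared against the finite-$\eta$ expression in Eq. (13) with a large $\eta$.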

A.2 $f_{NL}^{\rm local}$ estimator

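The preceding subsection describes our algorithm for computing $C^{-1}\hat a$, where $\hat a_{\ell m}$ is a minimum-variance map made by optimally combining all WMAP channels. The optimal $f_{NL}^{\rm local}$ estimator is obtained from this as follows:

$$ \hat f_{NL}^{\rm local} = \frac{1}{\mathcal{N}} \sum_{\ell_i m_i} B_{\ell_1 \ell_2 \ell_3} \begin{pmatrix} \ell_1 & \ell_2 & \ell_3 \\ m_1 & m_2 & m_3 \end{pmatrix} \left[ (C^{-1}\hat a)_{\ell_1 m_1} (C^{-1}\hat a)_{\ell_2 m_2} (C^{-1}\hat a)_{\ell_3 m_3} - 3\, (C^{-1})_{\ell_1 m_1, \ell_2 m_2} (C^{-1}\hat a)_{\ell_3 m_3} \right] \tag{14} $$

where $B_{\ell_1 \ell_2 \ell_3}$ is the local bispectrum normalized to $f_{NL}^{\rm local} = 1$.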

We have written the estimator in harmonic space where it is simplest, but in practice the cubic term is evaluated efficiently in position space using the KSW construction [17]. The linear term is obtained as a Monte Carlo average following

$$ (C^{-1})_{\ell_1 m_1, \ell_2 m_2} = \left\langle (C^{-1} a^s)_{\ell_1 m_1} (C^{-1} a^s)_{\ell_2 m_2} \right\rangle_s \tag{15} $$

where $\langle \cdot \rangle_s$ denotes an average over signal+noise simulations $s$.
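The Monte Carlo average works because, for simulations with $\langle a^s a^{sT} \rangle = C$, the filtered maps $C^{-1}a^s$ have covariance $C^{-1} C C^{-1} = C^{-1}$. The toy check below (a random $6 \times 6$ covariance, assumed only for illustration) estimates $C^{-1}$ from simulations; the deviation should shrink roughly as $1/\sqrt{n_{\rm sims}}$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
B = rng.standard_normal((n, n))
C = B @ B.T + n * np.eye(n)             # toy signal+noise covariance (assumption)
Cinv = np.linalg.inv(C)
L = np.linalg.cholesky(C)               # C = L L^T, used to draw a^s ~ N(0, C)

n_sims = 200_000
sims = L @ rng.standard_normal((n, n_sims))   # each column is one simulation a^s
filtered = Cinv @ sims                        # C^{-1} a^s for every simulation
Cinv_mc = filtered @ filtered.T / n_sims      # <(C^{-1} a^s)(C^{-1} a^s)^T>_s

rel_err = np.max(np.abs(Cinv_mc - Cinv)) / np.max(np.abs(Cinv))
print(f"max relative deviation: {rel_err:.4f}")
```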

The normalizing constant $\mathcal{N}$ appearing in our pipeline is computed using non-Gaussian simulations as described in [H], where some computational speedups for the cubic and linear terms are also presented. This end-to-end normalization ensures that the estimator is unbiased (i.e. $\langle \hat f_{NL}^{\rm local} \rangle = f_{NL}^{\rm local}$) without making any approximations.

We compute the error $\sigma(f_{NL}^{\rm local})$ by Monte Carlo. In each simulation, we randomly generate a CMB realization $a$ and a noise realization in each channel, then process the simulation in the same way as the real data. Thus in our pipeline, the same set of Monte Carlo simulations is used to compute three quantities: the linear term in the estimator, the normalization $\mathcal{N}$, and the error $\sigma(f_{NL}^{\rm local})$.

When we report estimates of $f_{NL}^{\rm local}$ using the suboptimal estimator (typically for purposes of direct comparison with [10], [11]), we define the suboptimal estimator by replacing $C^{-1}\hat a$ in Eqs. (14), (15) by a heuristically constructed map which is linear in the WMAP data and approximates $C^{-1}\hat a$. This requires making arbitrary choices in several places (e.g. relative weighting of different channels), so two implementations of "the" suboptimal estimator will in general give different results. We have attempted to follow [10] as closely as possible. More precisely, our suboptimal estimator is obtained by replacing $C^{-1}\hat a$ by the quantity denoted by $a_{\rm hn}$ in Eq. (A27) of [10].

In principle, our estimator has some nonzero response to the "equilateral" three-point signal introduced in [13], and to various three-point correlations between late-universe anisotropies such as ISW, point sources, gravitational lensing, and thermal SZ. This nonzero response can be corrected by jointly estimating $f_{NL}^{\rm local}$ in combination with additional three-point signals ($f_{NL}^{\rm equil}$, ...), but we do not do so here, and simply use Eq. (14) directly. This makes it straightforward to compare results with [10], [11], where the same approach was used. Including joint estimation of $f_{NL}^{\rm equil}$ should not appreciably change the results, since the cross-correlation between the local and equilateral shapes is small, and the equilateral shape is not detected in WMAP. Some secondary contributions to $f_{NL}^{\rm local}$ have been studied and have all been predicted to be small compared to the WMAP statistical error [T^], [28], [29], [30], [31], [32], [33], [34], although it is not clear that all possible secondaries have been studied.

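Throughout this paper, we have shown the $\ell$-dependence of our $f_{NL}^{\rm local}$ estimates in two ways: either by plotting a cumulative $f_{NL}^{\rm local}$ estimate versus $\ell_{\max}$ (e.g. top panel of Fig. 2), or by defining bins in $\ell$ and reporting an independent estimate of $f_{NL}^{\rm local}$ in each bin (e.g. bottom panel of Fig. 2). In the second case, the $f_{NL}^{\rm local}$ estimate for a bin $[\ell_0, \ell_1]$ is defined by restricting the sum in the estimator (Eq. (14)) to triples $(\ell_1, \ell_2, \ell_3)$ which satisfy $\ell_0 < \max(\ell_1, \ell_2, \ell_3) \le \ell_1$. In implementation, it is convenient to note that the binned and unbinned estimators are related by:

$$ \hat f_{NL}^{\rm local}(\ell_0, \ell_1) = \frac{\mathcal{N}_{\ell_1}\, \hat f_{NL}^{\rm local}(\ell_{\max} = \ell_1) - \mathcal{N}_{\ell_0}\, \hat f_{NL}^{\rm local}(\ell_{\max} = \ell_0)}{\mathcal{N}_{\ell_1} - \mathcal{N}_{\ell_0}} \tag{16} $$

where $\hat f_{NL}^{\rm local}(\ell_0, \ell_1)$ denotes the binned estimator and $\hat f_{NL}^{\rm local}(\ell_{\max} = \ell)$ denotes the unbinned estimator with normalization constant $\mathcal{N}_\ell$. (Note that the normalization constant $\mathcal{N}$ appearing in Eq. (14) depends on $\ell_{\max}$.)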
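The relation between binned and unbinned estimators follows because each unbinned estimator is its unnormalized cubic sum divided by $\mathcal{N}_\ell$. So $\mathcal{N}_{\ell_1}\hat f(\ell_{\max}=\ell_1) - \mathcal{N}_{\ell_0}\hat f(\ell_{\max}=\ell_0)$ is the sum over triples with $\ell_0 < \max(\ell_1,\ell_2,\ell_3) \le \ell_1$, and the normalization of that bin is $\mathcal{N}_{\ell_1} - \mathcal{N}_{\ell_0}$. The sketch below checks this on hypothetical per-$\ell$ contributions; `binned_fnl` is an illustrative name and the numbers are not WMAP results.

```python
import numpy as np


def binned_fnl(fnl_cum, norm_cum, l0, l1):
    """Binned estimate for l0 < max(l1, l2, l3) <= l1 from cumulative estimates.

    fnl_cum[L], norm_cum[L]: unbinned estimate and its normalization for lmax = L.
    """
    num = norm_cum[l1] * fnl_cum[l1] - norm_cum[l0] * fnl_cum[l0]
    return num / (norm_cum[l1] - norm_cum[l0])


# Hypothetical toy data indexed by the largest multipole of each triple:
# E[l] is the unnormalized cubic sum, R[l] its response to f_NL = 1.
rng = np.random.default_rng(3)
lmax = 750
R = rng.uniform(0.5, 1.5, lmax + 1)
E = 10.0 * R + rng.normal(0.0, 5.0, lmax + 1)

norm_cum = np.cumsum(R)                   # N_L for lmax = L
fnl_cum = np.cumsum(E) / norm_cum         # unbinned estimate for lmax = L

l0, l1 = 100, 300
direct = E[l0 + 1:l1 + 1].sum() / R[l0 + 1:l1 + 1].sum()
from_cumulative = binned_fnl(fnl_cum, norm_cum, l0, l1)
```

This is why only the cumulative estimates and their normalizations need to be stored: any bin can be recovered afterwards without rerunning the cubic sum.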